Introduction

Website

CSV

The two YouTube channels I selected are @LEMMiNO and @Rousseau.

Lemmino Link

Rousseau Link

I chose Rousseau mainly because it is a YouTube channel about piano music (it covers classical, pop, and almost every other genre), and I listen to his piano covers a lot when studying because they are soothing and help me focus. As for LEMMiNO, I have followed his channel since 2012, and it has improved significantly in quality (both content and production). Essentially, this channel offers informative content across various topics. I highly recommend visiting his channel and watching some of his videos if you have not already.

Two ideas I had before accessing the data:

  1. Compare the subscriber count of both channels across the years.
  2. Plot a time series graph showing the number of views, likes, and comments over time for each channel. Use line graphs with different colors representing each metric for each channel.

Summary of the data process:

I decided to use the comment, view, and like counts from the dataset, as they are crucial for evaluating the performance of a YouTube channel. I used line charts to compare numerical variables over time, such as views per year, as they effectively illustrate trends and patterns in continuous data. For comparisons involving a numerical and a categorical variable, such as the number of popular and non-popular videos, I used bar charts. Bar charts give a clear visual comparison between categories, which makes them suitable for conveying differences or relationships between numerical and categorical data.
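The popular / non-popular split mentioned above can be sketched with a grouped `mutate`. This is only a minimal sketch: it assumes the rule stated in the data story (a video counts as popular when its views, likes, or comments reach the mean for its channel), and `toy_videos`, its channel names, and its values are invented for illustration. Only the column names follow the real dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (invented values); column names follow youtube_data.
toy_videos <- tibble(
  channelName  = c("Channel A", "Channel A", "Channel A",
                   "Channel B", "Channel B", "Channel B"),
  viewCount    = c(100, 200, 900, 50, 60, 70),
  likeCount    = c(10, 20, 90, 5, 6, 7),
  commentCount = c(1, 2, 9, 1, 1, 4)
)

# Flag a video as popular when at least one metric is at or above
# the mean of that metric within its own channel, then count per channel.
popular_summary <- toy_videos %>%
  group_by(channelName) %>%
  mutate(
    popular = viewCount >= mean(viewCount) |
      likeCount >= mean(likeCount) |
      commentCount >= mean(commentCount)
  ) %>%
  summarise(popular_count = sum(popular), total = n(), .groups = "drop") %>%
  # Build the bar label from the data instead of typing it by hand.
  mutate(label = paste0(popular_count, " out of ", total, " videos are popular"))

# Bar chart: one bar per channel (categorical x, numerical y).
popular_plot <- ggplot(
  popular_summary,
  aes(x = channelName, y = popular_count, fill = channelName)
) +
  geom_col() +
  geom_text(aes(label = label), vjust = -0.5, size = 3) +
  labs(x = "Channel Name", y = "Popular Count", title = "Popular Count by Channel") +
  theme_bw()

popular_plot
```

Grouping by `channelName` keeps the means channel-specific without filtering each channel separately, and deriving the label from the counts means it stays correct if the data changes.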

However, one of the charts I attempted, illustrating the channel metrics over time, did not yield the desired results. For reference, here is the chart:

fail_chart.png

The plot displayed a significant disparity between the view count and the other metrics (likes and comments). Views run into the millions, which flattens the other two lines against the axis and renders the chart ineffective for analysis.
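One possible way around this scale problem, sketched here rather than taken from the project, is to reshape the metrics into long format and give each metric its own panel with an independent y-axis. The `toy_data` tibble and its values are invented for illustration. Only the column names follow the real dataset.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for youtube_data (values are invented).
toy_data <- tibble(
  yearReleased = c(2019, 2020, 2021, 2022),
  viewCount    = c(2500000, 4100000, 3300000, 5200000),
  likeCount    = c(90000, 150000, 120000, 210000),
  commentCount = c(4000, 7500, 6100, 9800)
)

# Reshape to long format: one row per year and metric.
metrics_long <- toy_data %>%
  pivot_longer(
    cols = c(viewCount, likeCount, commentCount),
    names_to = "metric",
    values_to = "count"
  )

# One panel per metric, each with its own y-axis, so likes and comments
# are no longer flattened by the much larger view counts.
metrics_plot <- ggplot(
  metrics_long,
  aes(x = yearReleased, y = count, colour = metric)
) +
  geom_line() +
  facet_wrap(vars(metric), ncol = 1, scales = "free_y") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    x = "Year Released", y = "Count", colour = "Metric",
    title = "Number of Views, Likes, and Comments Over Time"
  ) +
  theme_bw()

metrics_plot
```

With the real data it would probably also help to aggregate per year first (for example with `group_by()` and `summarise()`), since each year contains several videos. A log-scaled y-axis is another option if all three metrics need to share one panel.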

Dynamic data story

data_story.gif

Creativity

For creativity, I included more than three plots in my data story. This increases the overall complexity and adds more story to it. Combining multiple variables and levels into one straightforward plot keeps the result easy to interpret even when the code behind it is complex (i.e. complex code, simple graph). I also added a few sentences next to each graph to provide more insight into the data.

For color coding, I mainly used red and white to reflect YouTube's main colors, ensuring contextual relevance. I also used blue as a contrast to the red to enhance visual distinction. For the CSS, I used light colors alongside a more vibrant color, which is the theme of this project. The design is structured in a tidier way, with CSS borders, varied fonts, and colors emphasizing headings and improving visual appeal.

As usual, I uploaded index.html to GitHub Pages and updated the link, updated the markdown, and uploaded all source code to GitHub.

I also added the preview URL as an additional GIF.

thumbnail.gif

Learning reflection

One of the most important ideas I learned from this assignment is the significance of both static and dynamic visualisations. Static visualisations offer a straightforward approach: they give easy access to the data without additional code processing and allow patterns or trends to be understood quickly. Dynamic visualisations, in contrast, offer an interactive and engaging way to explore data and better convey the relationships between variables or groups within it. By incorporating features like grouping, they can provide deeper insights and enhance understanding of complex datasets. In conclusion, using both offers flexibility: static visuals enable quick data comprehension upfront, while dynamic ones support deeper insights as analysis progresses.

For future exploration, I would like to understand more about plot interactivity and to discover different chart types that enhance information visualisation. Knowing how to make plots interactive can greatly improve engagement and understanding, allowing users to explore data dynamically. In addition, exploring chart types beyond traditional ones like bar and line charts can offer unique ways to present data, potentially revealing insights that are less apparent with standard charts. Delving deeper into these areas will expand my understanding of data visualisation.

Appendix

library(tidyverse)
library(dplyr)
library(jsonlite)
library(magick)

youtube_data <- data.frame(read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRAk1VTiPEDf6H_bk442dTuh3CTLgjyd7HDgh-QhlkyWs-onXy8_ZnOp2i_BxgM0hIYx_gTB-rfw4fT/pub?output=csv"))

view(youtube_data)

youtube_data <- youtube_data %>%
    mutate(
        yearReleased = datePublished %>% str_sub(1, 4) %>% parse_number()
    )

viewCountsPerYear <- youtube_data %>%
    group_by(yearReleased, channelName) %>%
    summarise(totalViewCount = sum(viewCount))

view(viewCountsPerYear)

# Alternative chart 1

# viewCountsPerYear %>%
#     ggplot(aes(x = yearReleased, y = totalViewCount, color = "green")) +
#     geom_line() +
#     labs(x = "Year Released", y = "View Count", title = "View Counts Over the Years") +
#     theme_minimal() +
#     scale_y_continuous(labels = scales::comma) +
#     facet_wrap(vars(channelName))


# Line plot 1 (Compare the ChannelName total viewCount by the year)
viewCountsPerYear %>%
    ggplot(aes(x = yearReleased, y = totalViewCount, color = channelName)) +
    # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
    geom_line(linewidth = 2) +
    labs(x = "Year Released", y = "View Count", title = "View Counts Over the Years") +
    theme_bw() +
    scale_y_continuous(labels = scales::comma) +
    scale_x_continuous(breaks = seq(2012, 2024, 2))
ggsave("plot1.png")


videoPerYear <- youtube_data %>%
    mutate(yearReleased = datePublished %>% str_sub(1, 4) %>% parse_number()) %>%
    group_by(channelName, yearReleased) %>%
    summarise(totVideos = n())
view(videoPerYear)

# Bar chart 2 (Total number of videos by the year)
videoPerYear %>%
    ggplot() +
    geom_bar(
        aes(
            x = yearReleased,
            y = totVideos,
            fill = channelName
        ),
        stat = "identity",
        position = "dodge",
        width = 0.7
    ) +
    theme_bw() +
   labs(x = "Year Released", y = "Total Video", title = "Total Videos Over the Years") +
    scale_x_continuous(breaks = seq(2012, 2024, 2))

ggsave("plot2.png")

# Bar Chart 3 (Comparison of total popular video respective to their channel)

# Finding the Mean for view count, like count, comment count (L = @LEMMiNO, R = @Rousseau)
LMean <- youtube_data %>%
    filter(channelName == "@LEMMiNO") %>%
    summarise(viewCount = mean(viewCount), likeCount = mean(likeCount), commentCount = mean(commentCount))
LMean

RMean <- youtube_data %>%
    filter(channelName == "@Rousseau") %>%
    summarise(viewCount = mean(viewCount), likeCount = mean(likeCount), commentCount = mean(commentCount))

# Requirement for a video to be popular (at least one of the 3 variables must be at or above the mean for its channel)

# @LEMMiNO popular grouping
Lpopular <- youtube_data %>%
    filter(channelName == "@LEMMiNO") %>%
    mutate(
        popular = ifelse(viewCount >= LMean$viewCount | likeCount >= LMean$likeCount | commentCount >= LMean$commentCount,
            "Popular", "Not Popular"
        )
    )
Lpopular
# @Rousseau popular grouping
Rpopular <- youtube_data %>%
    filter(channelName == "@Rousseau") %>%
    mutate(
        popular = ifelse(viewCount >= RMean$viewCount | likeCount >= RMean$likeCount | commentCount >= RMean$commentCount,
            "Popular", "Not Popular"
        )
    )

# Combined the datasets
popularData <- bind_rows(Lpopular, Rpopular)

popularData <- popularData %>%
    group_by(channelName) %>%
    summarise(popular_count = sum(popular == "Popular", na.rm = TRUE))
view(popularData)

# Labels (hard-coded; the order must match the channelName order in popularData)
label <- c("44 out of 100 videos are popular", "34 out of 100 videos are popular")

ggplot(popularData, aes(x = channelName, y = popular_count, fill = channelName)) +
    geom_bar(stat = "identity") +
    labs(x = "Channel Name", y = "Popular Count", title = "Popular Count by Channel") +
    geom_text(aes(label = label), position = "stack", vjust = -0.5, size = 3) +
    theme_bw()
ggsave("plot3.png")



# Bar plot 4 (Count the word occurrences in the title column)
title_word_counts <- youtube_data %>%
    select(title) %>%
    separate_rows(title, sep = " ") %>%
    mutate(clean_word = str_to_lower(title) %>%
        str_remove_all("[[:punct:]]")) %>%
    filter(
        !clean_word == ""
    ) %>%
    group_by(clean_word) %>%
    summarise(n = n()) %>%
    arrange(desc(n)) %>%
    slice(1:10) %>%
    ungroup()

title_word_counts %>%
    ggplot(aes(
        x = reorder(clean_word, n),
        y = n
    )) +
    geom_col(fill = "lightblue") +
    geom_text(aes(label = clean_word),
        colour = "lightblue",
        size = 8,
        position = position_nudge(y = 3)
    ) +
    geom_text(aes(label = n),
        position = position_nudge(y = -1),
        colour = "black",
        size = 6
    ) + theme_bw() +
  theme(panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank()) +
    labs(
        x = "Word",
        y = "Word Occurrence",
        title = "Top 10 Most Used Words in Titles"
    )
ggsave("plot4.png")


ggplot(youtube_data, aes(x = yearReleased)) +
    geom_line(aes(y = viewCount, color = "Views")) +
    geom_line(aes(y = likeCount, color = "Likes")) +
    geom_line(aes(y = commentCount, color = "Comments")) +
    labs(x = "Date", y = "Count", color = "Metric", title = "Number of Views, Likes, and Comments Over Time") +
    scale_color_manual(values = c("Views" = "blue", "Likes" = "green", "Comments" = "red")) +
    scale_y_continuous(labels = scales::comma) +
    scale_x_continuous(breaks = seq(2012, 2024, 2)) +
    theme_bw()
ggsave("fail_chart.png")


thumbnail <- youtube_data$thumbnailUrl %>% na.omit()

image_read(thumbnail) %>%
  image_join() %>%
  image_scale(500) %>%
  image_animate(fps = 1) %>%
  image_write("thumbnail.gif")
library(tidyverse)
library(dplyr)
library(jsonlite)
library(magick)


# Slide 1
frame1 <- image_blank(1200, 400, "#B20000") %>%
    image_annotate("YouTube Engagement Over Time For @LEMMiNO and @Rousseau",
        color = "#FFFFFF",
        size = 38,
        font = "sans",
        gravity = "Center"
    )
frame1

slide2 <- image_read("plot1.png") %>%
    image_scale("600x400!")
slide2Text <- image_blank(600, 400, "#B20000") %>%
    image_annotate("
    
    
  This chart shows each channel's view count over the years
  
  1. There are instances where @Rousseau's videos outperform 
      @LEMMiNO's, indicating differences in content popularity 
      or audience engagement strategies.
  
  2. Both creators experienced fluctuations in view counts 
      across the years, suggesting the influence of various 
      factors such as video topics, trends, and algorithm changes.
                   ",
        color = "#FFFFFF",
        size = 20,
        font = "Trebuchet"
    )
frame2 = c(slide2, slide2Text) %>%
  image_append(stack = FALSE)
frame2


slide3 <- image_read("plot2.png") %>%
  image_scale("600x400!")

slide3Text <- image_blank(600, 400, "#B20000") %>%
  image_annotate("
  
  
  This chart shows the video count over the years for 
  each channel
  
  1. The total video count for @LEMMiNO shows a 
      general decreasing trend over the years, 
      indicating a change in production style.
    
  2. In contrast, @Rousseau's video uploads appear 
      less consistent, with intermittent 
      years of video releases. However, there's a 
      trend of increasing video production from 2017 onwards.
                 ",
                 color = "#FFFFFF",
                 size = 20,
                 font = "Trebuchet"
  )
frame3 = c(slide3, slide3Text) %>%
  image_append(stack = FALSE)
frame3


slide4 <- image_read("plot3.png") %>%
  image_scale("600x400!")

slide4Text <- image_blank(600, 400, "#B20000") %>%
  image_annotate("
  
  
  
  This chart shows the popular video count. To qualify as popular, 
  a video's views, comments, or likes count must equal or exceed 
  the mean of each respective variable within the video channel.  
      
  1. For @LEMMiNO, 44 out of 100 videos are popular.
  
  2. For @Rousseau, 34 out of 100 videos are popular.
                 ",
                 color = "#FFFFFF",
                 size = 20,
                 font = "Trebuchet"
  )
frame4 = c(slide4, slide4Text) %>%
  image_append(stack = FALSE)
frame4

slide5 <- image_read("plot4.png") %>%
  image_scale("600x400!")

slide5Text <- image_blank(600, 400, "#B20000") %>%
  image_annotate("


  
  This chart shows the word occurrences in the YouTube 
  video titles
  
  1. The top 2 words are 'piano' and 'version', probably 
      from @Rousseau's videos, since most of their titles 
      contain 'piano' or 'piano version'
      
  2. Some occurrences of 'part' are from @LEMMiNO, because 
      most of his videos are separated into parts
                 ",
                 color = "#FFFFFF",
                 size = 20,
                 font = "Trebuchet"
  )
frame5 = c(slide5, slide5Text) %>%
  image_append(stack = FALSE)
frame5


frame6 <- image_blank(1200, 400, "#B20000") %>%
    image_annotate("

  Overall, I learned that analyzing data trends can provide valuable insights 
  into the performance and growth of content creators on platforms like YouTube. 
  
  For instance, developing a line chart to visualize the view count over the years 
  illustrates the channel's progressive growth over time.
  
  or
  
  Comparing a bar chart of total video count across years with the previous chart 
  suggests that fewer videos paired with higher view counts indicate a focus on quality over quantity.
                   ",
        color = "#FFFFFF",
        size = 25,
        font = "sans"
    )
frame6


template_animation = c(frame1, frame2, frame3, frame4, frame5, frame6)

data_story = image_animate(template_animation, delay=500)
data_story

save = image_write(data_story, "data_story.gif")